The purpose of this research was to assess the 
criterion-referenced validity of student ratings of instructors. A 
total of 480 undergraduates rated their instructors using a special 
rating scale designed to parallel the Flanders Interaction Analysis 
Categories. Expert observers also rated the instructors using the 
standard form of the Flanders Categories. Mean student ratings for 
instructors were correlated with expert observers' scores. 
Significant correlations were found between ratings for four 
categories. These results were interpreted as revealing some 
criterion-referenced validity for student ratings. (Author) 
Research on the reliability of student ratings of 
instruction indicates that students are indeed reliable raters 
of their instructors. Reliability coefficients range from 
moderately positive to high positive correlations (McKeachie, 
1969). However, very little research has been reported on the 
validity of student ratings of instruction. 

Most researchers and users of student ratings of instruction 
are satisfied with the face validity of the instruments if the 
content of items seems to focus on significant aspects of 
instruction (Remmers, 1963). Studies of the construct 
validity of student rating forms through factor analysis 
have been only moderately successful in identifying replicable 
and interpretable components of teacher behavior (Derry, 197?). 
A number of researchers have also assessed the concurrent or 
predictive validity of student ratings of instruction by 
correlating student ratings with ratings of the same instructors 
by alumni (Drucker and Remmers, 1951), colleagues (Guthrie, 
1954; Maslow and Zimmerman, 1956), and supervisors (Costin 
et al., 1971; Hayes, 1971). Substantial agreement among different 
groups has been found. 

Perhaps the ideal way to deal with the validity problem 
and to assess the accuracy of students' ratings was proposed 
by Halstead, Feldhusen and McDaniel (1970). They suggested 
that rating of instruction be done by expert observers and 
the results compared with student ratings. Like the studies of 
concurrent and predictive validity, this is an evaluation of 
criterion-referenced validity. This approach was used in 
the present study. The questions were stated as follows: 
Are student and teacher verbal behaviors as observed by 
professional observers correlated with ratings of these same 
behaviors by the students themselves? Are there significant 
differences between student and expert observers in the amount 
of each type of behavior observed? 

Methods 

Subjects 

Eighteen instructors, twelve males and six females, 
and 480 undergraduate students enrolled in eight educational 
psychology classes, eight general psychology classes, and 
two sociology classes were the subjects for this study. These 
sections were taught by five instructors and fourteen graduate 
and teaching assistants. Approximately one third of the students 
were males and two thirds were females. Students ranged from 
freshmen to seniors in college. The number of students in 
classes ranged from 18 to 44 with a mean of 27.3. 

Procedures 

The Flanders Interaction Analysis Categories (FIAC; Flanders, 
1970) were used to assess student-teacher verbal interactions. 
Two trained observers visited the classes and observed and 
recorded the interactions. Inter-rater reliability was .85. 

The following teacher behaviors and interactions were 
assessed: (1) acceptance of feelings, (2) praise and encouragement, 
(3) use of student ideas, (4) asking questions, (5) lecturing, 
(6) giving directions, (7) criticizing, (8) student talk - response, 
and (9) student talk - initiation. 

To obtain student ratings of teacher behavior and student-teacher 
interactions, an Interaction Analysis Questionnaire 
(IAQ) was developed and administered to the students (Touq, 
1972). This questionnaire consists of nine items representing 
student and teacher verbal behaviors parallel to the first 
nine categories of the FIAC (Flanders, 1970). Test-retest 
reliability was found to be .75. 

Scores on both the FIAC and the IAQ were percentages of 
classroom time spent in each of the nine types of behavior. 
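As a rough illustration of how a category's share of class time can be derived from observation records, the sketch below converts a sequence of FIAC category codes into percentages. It assumes, hypothetically, that each recorded code stands for an equal slice of class time and that codes for silence or confusion (category 10) count toward total time but are not reported. The function name and the data are invented for illustration; this is not the authors' scoring procedure.

```python
from collections import Counter

# The nine categories analysed in this study. Category 10 (silence or
# confusion) is assumed to be recorded but not reported.
CATEGORIES = {
    1: "accepting feelings",
    2: "praising or encouraging",
    3: "accepting ideas",
    4: "asking questions",
    5: "lecturing",
    6: "giving directions",
    7: "criticizing",
    8: "student talk - response",
    9: "student talk - initiation",
}


def category_percentages(codes):
    """Return the percentage of all recorded codes falling in categories 1-9.

    Hypothetical helper: every code is assumed to represent an equal slice of
    class time, so the denominator includes codes outside categories 1-9.
    """
    if not codes:
        raise ValueError("no observations recorded")
    counts = Counter(codes)  # missing categories count as 0
    total = len(codes)
    return {cat: 100.0 * counts[cat] / total for cat in CATEGORIES}


# Hypothetical record of ten observation intervals for one class.
record = [5, 5, 5, 5, 4, 8, 9, 5, 10, 5]
for cat, pct in category_percentages(record).items():
    print(f"({cat}) {CATEGORIES[cat]}: {pct:.1f}%")
```

Because the unreported category stays in the denominator, the nine percentages sum to less than 100, which is consistent with the FIAC means in Table 1 summing to about 95.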



Frequencies of the FIAC were then correlated with 
student ratings of instructors on the IAQ for the parallel 
categories. Alpha was set at .10. Differences between means 
for each category on the FIAC and IAQ were evaluated with a t 
test for correlated means with alpha equal to .05. 
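The correlational step can be sketched as follows. Each instructor's IAQ score for a category is taken as the mean of that class's student ratings, and these class means are paired with the same instructors' FIAC percentages. The function names and data are hypothetical, and the sketch assumes the ordinary Pearson product-moment coefficient, which the text implies but does not name.

```python
from math import sqrt
from statistics import mean


def class_means(ratings_by_class):
    """Average the individual IAQ ratings within each class (one value per instructor)."""
    return [mean(ratings) for ratings in ratings_by_class]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data for four instructors on a single category.
fiac_pct = [1.0, 2.5, 0.5, 3.0]                               # observer percentages
iaq_ratings = [[6, 8, 7], [10, 9], [5, 6, 4], [12, 11, 10]]   # student ratings per class
r = pearson_r(fiac_pct, class_means(iaq_ratings))
print(f"r = {r:.2f}")
```

In the study itself, `fiac_pct` would hold the eighteen instructors' FIAC percentages for one category and `iaq_ratings` their classes' IAQ ratings for the parallel item, giving one coefficient per category.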

Results 

Table 1 shows the means of student ratings of classroom 
interaction activities for all the classes involved in this 
study and the assessments of the same activities using the 
FIAC. Table 1 also gives the correlations between the IAQ mean 
scores and the FIAC scores. Four of the nine correlations 
were significant (-.43, .49, .44, and .61), with a fifth correlation 
approaching significance (.36). "Accepting feelings" on the 
FIAC had a significant and negative correlation with the same 
category on the IAQ (-.43). "Praising and encouraging" on the 
FIAC had a significant and positive correlation with the same 
category on the IAQ (.49). "Lecturing" on the FIAC had a 
significant and positive correlation with the same category on 
the IAQ (.44). "Student talk - initiation" on the FIAC had a 
significant and positive correlation with the same category on the 
IAQ (.61). The correlation between "student talk - response" on the 
FIAC and the IAQ also approached significance (.36). 
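Whether a correlation of a given size reaches significance depends on the number of pairs. The sketch below assumes one FIAC/IAQ pair per instructor (n = 18, so df = 16) and reads the .10 alpha as two-tailed. Under those assumptions it converts each reported r to a t statistic, t = r * sqrt(df / (1 - r^2)), and compares it with the tabled critical value of 1.746, which corresponds to a critical |r| of about .40. Neither the sample size nor the two-tailed reading is stated in the text, but under them exactly the four correlations reported as significant exceed the criterion, and .36 falls just short.

```python
from math import sqrt

# Correlations reported in Table 1 (FIAC vs. IAQ).
REPORTED_R = {
    "accepting feelings": -0.43,
    "praising or encouraging": 0.49,
    "accepting ideas": 0.16,
    "asking questions": 0.01,
    "lecturing": 0.44,
    "giving directions": 0.12,
    "criticizing": 0.08,
    "student talk - response": 0.36,
    "student talk - initiation": 0.61,
}

N_PAIRS = 18        # assumption: one pair of scores per instructor
DF = N_PAIRS - 2
T_CRIT = 1.746      # two-tailed alpha = .10, df = 16 (standard t table)


def t_for_r(r, df=DF):
    """t statistic for testing H0: rho = 0 given a sample correlation r."""
    return r * sqrt(df / (1 - r * r))


def critical_r(t_crit=T_CRIT, df=DF):
    """Smallest |r| that reaches significance: r = t / sqrt(t^2 + df)."""
    return t_crit / sqrt(t_crit ** 2 + df)


significant = [name for name, r in REPORTED_R.items() if abs(t_for_r(r)) > T_CRIT]
print(f"critical |r| = {critical_r():.3f}")
print("significant:", significant)
```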

Differences between the means for each parallel category 
on the FIAC and IAQ were tested using the t test for correlated 
means (Winer, 1971) and an alpha level of .05. The results 
indicate that the differences were significant for seven of 
the nine means. These were "accepting feelings" (t = 14.55), 
"praising or encouraging" (t = 14.97), "accepting ideas" 
(t = 8.69), "lecturing" (t = 7.06), "giving directions" (t = 2.29), 
"criticizing or justifying authority" (t = 3.59), and "student talk - 
response" (t = 9.05). The differences between the means of the FIAC 
and the IAQ were not significant for "asking questions" (t = 1.54) 
and "student talk - initiation" (t = 1.45). 
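The t test for correlated means works on the within-pair differences d: t = mean(d) / (s_d / sqrt(n)), with n - 1 degrees of freedom. The minimal sketch below uses hypothetical data. The critical value of 2.110 assumes n = 18 instructor pairs (df = 17) tested two-tailed at alpha = .05, which the text does not state explicitly.

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT_05_DF17 = 2.110  # two-tailed alpha = .05, df = 17 (standard t table)


def correlated_t(x, y):
    """t statistic for the difference between two correlated (paired) means."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need paired samples of equal length >= 2")
    d = [a - b for a, b in zip(x, y)]          # within-pair differences
    return mean(d) / (stdev(d) / sqrt(len(d)))  # mean difference over its standard error


# Hypothetical paired scores (IAQ mean, FIAC percentage) for four instructors.
iaq = [9.0, 12.0, 7.5, 11.0]
fiac = [0.0, 0.5, 0.0, 0.2]
t = correlated_t(iaq, fiac)
print(f"t = {t:.2f}, df = {len(iaq) - 1}")
```

With 18 pairs, any |t| above 2.110 would be significant under these assumptions. The reported 2.29 for "giving directions" clears that threshold, while the values near 1.5 for "asking questions" and "student talk - initiation" do not.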

Discussion 

The first question asked in this research was: Are student 
and teacher verbal behaviors as observed by professional observers 
correlated with ratings of these same teacher behaviors by 
the students themselves? The answer is affirmative. Three 
significant and positive correlations were found. The largest 
was the correlation between the FIAC and IAQ for "student talk - 
initiation" (r = .61); the other two were for "praising or 
encouraging" (r = .49) and "lecturing" (r = .44). The correlation 
for category 1, "accepting feelings" (r = -.43), was also 
significant but negative. 

The second question was: Are there significant differences 
in the amount of each type of behavior observed between students 
and expert observers? Significant differences were found for 
"accepting feelings," "praising or encouraging," "accepting ideas," 
"lecturing," "giving directions," "criticizing or justifying 
authority," and "student talk - response." The differences were 
not significant for "asking questions" and "student talk - 
initiation." The means of the IAQ categories were all larger than 
the means of the FIAC categories except for category five 
(lecturing), where the mean of the FIAC was larger than the mean 
of the IAQ. 

The correlations in this study indicate some agreement 
between students and expert observers with regard to instructors' 
classroom behaviors. Thus, there is moderate support for 
the criterion-referenced validity of student ratings of 
instruction. Of particular significance are students' perceptions 
of their own behavior. Students were most accurate in assessing 
their own initiated talk in the classroom. The correlation with 
expert observers was .61, and there was no significant difference 
between the FIAC and IAQ means. The fact that the correlation for 
"student talk - response" was not significant and the difference 
between FIAC and IAQ means was so great might be due to some 
confusion on the part of the students in differentiating between 
initiated talk and talk in response to a question. 

Of particular interest is the significant negative correlation 
for category 1, "accepting feelings," between the FIAC and the IAQ. 
This is coupled with the large difference between means. Students 
see much more of this behavior than observers. Perhaps the students 
are rating on the basis of out-of-class teacher behaviors. But 
this still leaves open the question of the negative correlation. 



It is possible to speculate that the teacher who shows little 
acceptance of student feelings in class shows much in personal 
conferences in his office. Conversely, the teacher who demonstrates 
acceptance of student feelings in class may show no such acceptance 
in personal contacts and thus be rated down by students. 

A number of researchers have indicated that student ratings 
are valid when they are evaluated against different criteria 
such as ratings by alumni, colleagues, and supervisors (Drucker and 
Remmers, 1951; Guthrie, 1954; Costin et al., 1971; Maslow and 
Zimmerman, 1956; Clark and Blackburn, 1971; Hayes, 1971). 
Thus, the results of this study add more support for the findings 
of these researchers. However, the approach of this study to 
criterion-referenced validity is unique and probably more 
important than the other approaches because outside professional 
observers have no personal stake in the educational process 
that might bias their ratings and because they are knowledgeable 
about instruction. 

Higher correlations might be obtained if there were some 
assurance that the students understood the specific behaviors 
they were rating. The subjects of this study were not previously 
exposed to either the FIAC or its parallel form, the IAQ. Training 
students on these scales might increase the accuracy of their 
ratings. Halstead, Feldhusen and McDaniel (1970) proposed such 
a procedure. Halstead (1972) carried out research which was 
partially successful in improving the reliability of student 
ratings by training the students in the rating procedures. 
Summary 



The purpose of this research was to assess the criterion-referenced 
validity of student ratings of instructors. A 
total of 480 undergraduates rated their instructors using a 
special rating scale designed to parallel the Flanders Interaction 
Analysis Categories. Expert observers also rated the instructors 
using the standard form of the Flanders Categories. Mean student 
ratings for instructors were correlated with expert observers' 
scores. Significant correlations were found between ratings for 
four categories. These results were interpreted as revealing 
some criterion-referenced validity for student ratings. 



Table 1 

Means and Standard Deviations 
for FIAC and IAQ Categories 

                                       FIAC                 IAQ
                                          Standard             Standard
Category                         Mean    Deviation    Mean    Deviation    Correlation

(1) Accepting feelings            0.08      0.17      11.70      3.70         -.43*
(2) Praising or encouraging       1.69      1.18       8.27      2.01          .49*
(3) Accepting ideas               3.37      2.39       8.28      2.75          .16
(4) Asking questions              9.98     13.97       8.96      3.13          .01
(5) Lecturing                    59.52     25.62      36.27     12.29          .44*
(6) Giving directions             1.23      1.34       3.98      5.99          .12
(7) Criticizing                   0.42      1.48       1.66      1.41          .08
(8) Student talk - response       4.37      3.55      11.44      3.11          .36
(9) Student talk - initiated     14.54     17.32       9.37      3.10          .61*

*Significant at the .10 level 